ProSpero: Active Learning for Robust Protein Design Beyond Wild-Type Neighborhoods
Kmicikiewicz, Michal, Fortuin, Vincent, Szczurek, Ewa
Designing protein sequences of both high fitness and novelty is a challenging task in data-efficient protein engineering. Exploration beyond wild-type neighborhoods often leads to biologically implausible sequences or relies on surrogate models that lose fidelity in novel regions. Here, we propose ProSpero, an active learning framework in which a frozen pre-trained generative model is guided by a surrogate updated from oracle feedback. By integrating fitness-relevant residue selection with biologically-constrained Sequential Monte Carlo sampling, our approach enables exploration beyond wild-type neighborhoods while preserving biological plausibility. We show that our framework remains effective even when the surrogate is misspecified. ProSpero consistently outperforms or matches existing methods across diverse protein engineering tasks, retrieving sequences of both high fitness and novelty.
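
The propose-score-resample loop that Sequential Monte Carlo sampling implies can be illustrated with a minimal, generic sketch. This is not the paper's biologically-constrained sampler guided by a frozen generative model; `surrogate_fitness` is a hypothetical stand-in for the oracle-updated surrogate, and the mutation proposal is the simplest possible choice.

```python
import numpy as np

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")

def surrogate_fitness(seq: str) -> float:
    """Hypothetical stand-in for a learned surrogate fitness model."""
    # Toy score: reward hydrophobic residues at even positions.
    return float(sum(seq[i] in "AILMFVW" for i in range(0, len(seq), 2)))

def smc_design(wild_type: str, n_particles: int = 64, n_steps: int = 10, seed: int = 0) -> str:
    """Generic SMC loop: mutate each particle, weight by surrogate, resample."""
    rng = np.random.default_rng(seed)
    particles = [wild_type] * n_particles
    for _ in range(n_steps):
        # Proposal: mutate one random position per particle.
        proposals = []
        for seq in particles:
            pos = rng.integers(len(seq))
            aa = rng.choice(AMINO_ACIDS)
            proposals.append(seq[:pos] + aa + seq[pos + 1:])
        # Weight by surrogate fitness (softmax for numerical stability) and resample.
        scores = np.array([surrogate_fitness(s) for s in proposals])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = [proposals[i] for i in idx]
    return max(particles, key=surrogate_fitness)

print(smc_design("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))
```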
SpecMER: Fast Protein Generation with K-mer Guided Speculative Decoding
Walton, Thomas, Tsui, Darin, Musharaf, Aryan, Aghazadeh, Amirali
Autoregressive models have transformed protein engineering by enabling the generation of novel protein sequences beyond those found in nature. However, their sequential inference introduces significant latency, limiting their utility in high-throughput protein screening. Speculative decoding accelerates generation by employing a lightweight draft model to sample tokens, which a larger target model then verifies and refines. Yet, in protein sequence generation, draft models are typically agnostic to the structural and functional constraints of the target protein, leading to biologically implausible outputs and a shift in the likelihood distribution of generated sequences. We introduce SpecMER (Speculative Decoding via k-mer Guidance), a novel framework that incorporates biological, structural, and functional priors using k-mer motifs extracted from multiple sequence alignments. By scoring candidate sequences in parallel and selecting those most consistent with known biological patterns, SpecMER significantly improves sequence plausibility while retaining the efficiency of speculative decoding. SpecMER achieves 24-32% speedup over standard autoregressive decoding, along with higher acceptance rates and improved sequence likelihoods.
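
The k-mer guidance idea can be sketched under assumptions: build a k-mer frequency profile from an MSA, then rank candidate drafts by the average log-probability of their k-mers under that profile. `kmer_profile` and `kmer_score` are illustrative names, not SpecMER's actual API, and real scoring would plug into the speculative decoding acceptance step.

```python
import math
from collections import Counter

def kmer_profile(msa_sequences, k=3):
    """Count k-mers across an MSA to form a plausibility prior."""
    counts = Counter()
    for seq in msa_sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}

def kmer_score(candidate, profile, k=3, floor=1e-6):
    """Average log-probability of the candidate's k-mers under the profile."""
    kmers = [candidate[i:i + k] for i in range(len(candidate) - k + 1)]
    return sum(math.log(profile.get(km, floor)) for km in kmers) / max(len(kmers), 1)

# Rank draft candidates in parallel by consistency with the MSA prior.
msa = ["MKTAYIA", "MKSAYIA", "MKTAYLA"]
profile = kmer_profile(msa)
candidates = ["MKTAYIA", "QQQQQQQ"]
print(max(candidates, key=lambda s: kmer_score(s, profile)))  # MSA-consistent candidate wins
```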
SpecMemo: Speculative Decoding is in Your Pocket
Recent advancements in speculative decoding have demonstrated considerable speedup across a wide array of large language model (LLM) tasks. Speculative decoding inherently trades extra memory allocations for several candidate tokens, whose acceptance rate drives the speedup. However, deploying speculative decoding on memory-constrained devices, such as mobile GPUs, remains a significant challenge in real-world scenarios. In this work, we present a device-aware inference engine named SpecMemo that controls memory allocations at a fine granularity to enable multi-turn chatbots with speculative decoding on such memory-limited devices. Our methodology stems from theoretically modeling the memory footprint of speculative decoding to determine a lower bound on the required memory budget while retaining speedup. SpecMemo strikes a careful balance between minimizing redundant memory allocations for rejected candidate tokens and maintaining competitive performance gains from speculation. Notably, with SpecMemo's memory management, we retain 96% of the overall throughput of speculative decoding on MT-Bench while reducing generation memory by 65% on a single Nvidia Titan RTX. Given multiple constrained GPUs, we build on previous speculative decoding architectures to facilitate big-model inference by distributing the Llama-2-70B-Chat model, for which we provide a novel batched speculative decoding scheme that increases the usability of multiple small server GPUs. This framework demonstrates a 2x speedup over distributed and batched vanilla decoding with the base model on eight AMD MI250 GPUs; moreover, inference throughput increases by a remarkable 8x at batch size 10. Our work contributes to democratizing LLM applications in resource-constrained environments, providing a pathway for faster and cheaper deployment of real-world LLM applications with robust performance.
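
To make the memory-budget reasoning concrete, here is a back-of-envelope sketch of the KV-cache footprint of speculative decoding. The formula and all parameter values are assumptions for illustration, not SpecMemo's actual footprint model, which also accounts for allocator behavior and rejected-token bookkeeping.

```python
def spec_decoding_kv_bytes(n_layers, n_heads, head_dim, max_seq_len,
                           n_draft_tokens, dtype_bytes=2):
    """Rough KV-cache footprint: target cache plus extra slots for draft tokens."""
    per_token = 2 * n_layers * n_heads * head_dim * dtype_bytes  # K and V tensors
    base = per_token * max_seq_len            # target model's cache
    speculative = per_token * n_draft_tokens  # extra slots for candidate tokens
    return base + speculative

# E.g. a 7B-class model (32 layers, 32 heads, head dim 128), fp16, 4 draft tokens:
budget = spec_decoding_kv_bytes(32, 32, 128, max_seq_len=4096, n_draft_tokens=4)
print(f"{budget / 2**20:.1f} MiB")  # ~2050 MiB; draft overhead is the small second term
```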
Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration
Wen, Zhuofan, Gui, Shangtong, Feng, Yang
Inference acceleration of large language models (LLMs) is required in many application scenarios, and speculative decoding has shown its advantage in addressing it. Speculative decoding usually introduces a draft model to assist the base LLM: the draft model produces drafts, and the base LLM verifies each draft for acceptance or rejection. In this framework, the final inference speed is determined by the decoding speed of the draft model and the acceptance rate of the drafts it provides. Currently, widely used draft models generate draft tokens for the next several positions in a non-autoregressive way without considering the correlations between draft tokens. As a result, they achieve high decoding speed but an unsatisfactory acceptance rate. In this paper, we focus on improving the performance of the draft model, aiming to accelerate inference via a high acceptance rate. To this end, we propose a CTC-based draft model that strengthens the correlations between draft tokens during the draft phase, thereby generating higher-quality draft candidate sequences. Experimental results show that, compared to strong baselines, the proposed method achieves a higher acceptance rate and hence a faster inference speed.
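
The draft-then-verify loop this abstract describes can be sketched generically. The version below uses greedy acceptance with two Hugging-Face-style causal LMs (anything returning `.logits` of shape `[batch, seq, vocab]`) and assumes batch size 1; the paper's actual contribution, the CTC-based draft head, is not shown here.

```python
import torch

@torch.no_grad()
def speculative_step(target, draft, input_ids, n_draft=4):
    """One generic draft-then-verify step with greedy acceptance."""
    # 1) Draft: propose n_draft tokens autoregressively with the small model.
    drafted = input_ids
    for _ in range(n_draft):
        logits = draft(drafted).logits[:, -1, :]
        drafted = torch.cat([drafted, logits.argmax(-1, keepdim=True)], dim=-1)
    # 2) Verify: a single target forward pass scores all drafted positions at once.
    t_logits = target(drafted).logits
    accepted = input_ids
    for i in range(input_ids.shape[1] - 1, drafted.shape[1] - 1):
        t_next = t_logits[:, i, :].argmax(-1, keepdim=True)
        accepted = torch.cat([accepted, t_next], dim=-1)
        if not torch.equal(t_next, drafted[:, i + 1:i + 2]):
            break  # mismatch: keep the target's token and stop accepting drafts
    return accepted
```

The speedup comes from step 2: one target forward pass can validate several draft tokens, so the expensive model runs fewer times per generated token when the acceptance rate is high.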
Truthful Aggregation of LLMs with an Application to Online Advertising
Soumalias, Ermis, Curry, Michael J., Seuken, Sven
Online platforms generate hundreds of billions of dollars in revenue per year by showing advertisements alongside their own content. Currently, these platforms are integrating Large Language Models (LLMs) into their services. This makes revenue generation from LLM-generated content the next major challenge in online advertising. We consider a scenario where advertisers aim to influence the responses of an LLM to align with their interests, while platforms seek to maximize advertiser value and ensure user satisfaction. We introduce an auction mechanism for this problem that operates without LLM fine-tuning or access to model weights and provably converges to the output of the optimally fine-tuned LLM for the platform's objective as computational resources increase. Our mechanism ensures that truthful reporting is a dominant strategy for advertisers and it aligns each advertiser's utility with their contribution to social welfare - an essential feature for long-term viability. Additionally, it can incorporate contextual information about the advertisers, significantly accelerating convergence. Via experiments with a publicly available LLM, we show that our mechanism significantly boosts advertiser value and platform revenue, with low computational overhead. While our motivating application is online advertising, our mechanism can be applied in any setting with monetary transfers, making it a general-purpose solution for truthfully aggregating the preferences of self-interested agents over LLM-generated replies.
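
The convergence claim suggests sampling from a reward-tilted distribution: it is a standard fact that the KL-regularized fine-tuning optimum has the form p*(y) ∝ p_LLM(y) · exp(r(y)/β). The sketch below approximates such a target via self-normalized importance sampling with the base LLM as proposal; the bid-weighted reward form, the function names, and the omission of payments are all assumptions for illustration, not the paper's actual mechanism.

```python
import math
import random

def aggregate_reply(sample_from_llm, advertiser_rewards, bids,
                    n_samples=128, beta=1.0, rng=None):
    """Draw a reply approximately from p*(y) ∝ p_LLM(y) · exp(Σᵢ bᵢ·rᵢ(y) / β).

    With the base LLM as proposal, the importance weight reduces to
    exp(Σᵢ bᵢ·rᵢ(y) / β); accuracy improves as n_samples grows.
    """
    rng = rng or random.Random(0)
    candidates = [sample_from_llm() for _ in range(n_samples)]
    log_w = [sum(b * r(y) for b, r in zip(bids, advertiser_rewards)) / beta
             for y in candidates]
    m = max(log_w)  # subtract max for numerical stability
    weights = [math.exp(lw - m) for lw in log_w]
    # Resample one candidate proportionally to its weight.
    pick = rng.random() * sum(weights)
    acc = 0.0
    for y, w in zip(candidates, weights):
        acc += w
        if acc >= pick:
            return y
    return candidates[-1]
```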
Generative AI-Based Text Generation Methods Using Pre-Trained GPT-2 Model
Pandey, Rohit, Waghela, Hetvi, Rakshit, Sneha, Rangari, Aparna, Singh, Anjali, Kumar, Rahul, Ghosal, Ratnadeep, Sen, Jaydip
A text generation model is a machine learning model that uses neural networks, especially the transformer architecture, to generate contextually relevant text based on linguistic patterns learned from extensive corpora. These models are trained on large amounts of textual data so that they can learn the complex structure of a language, including its grammar, vocabulary, phrases, and styles. Text generation models can increase human productivity in existing business processes: they are already automating content creation across industries, generating reports, summaries, and emails, among other documents, and they enable a greater level of personalization in communications between businesses and their customers.
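
A minimal, runnable example of text generation with the pre-trained GPT-2 model via the Hugging Face transformers library; the prompt and sampling parameters are illustrative defaults, not settings from the paper.

```python
# Requires: pip install transformers torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "Quarterly report summary:"
inputs = tokenizer(prompt, return_tensors="pt")

# Nucleus sampling keeps only the most probable tokens whose mass reaches top_p.
output_ids = model.generate(
    **inputs,
    max_new_tokens=60,
    do_sample=True,
    top_p=0.9,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad-token warning
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```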
Extending Conformal Prediction to Hidden Markov Models with Exact Validity via de Finetti's Theorem for Markov Chains
Nettasinghe, Buddhika, Chatterjee, Samrat, Tipireddy, Ramakrishna, Halappanavar, Mahantesh
Conformal prediction is a widely used method to quantify the uncertainty of a classifier under the assumption of exchangeability (e.g., IID data). We generalize conformal prediction to the Hidden Markov Model (HMM) framework, where the assumption of exchangeability is not valid. The key idea of the proposed method is to partition the non-exchangeable Markovian data from the HMM into exchangeable blocks by exploiting de Finetti's Theorem for Markov chains due to Diaconis and Freedman (1980). The permutations of the exchangeable blocks are viewed as randomizations of the observed Markovian data from the HMM. The proposed method provably retains all desirable theoretical guarantees offered by the classical conformal prediction framework in both exchangeable and Markovian settings. In particular, while the lack of exchangeability introduced by Markovian samples violates a crucial assumption of classical conformal prediction, the proposed method treats it as an advantage that can be exploited to further improve performance. Detailed numerical and empirical results that complement the theoretical conclusions are provided to illustrate the practical feasibility of the proposed method.
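
For contrast with the Markovian extension, here is a minimal sketch of the classical split conformal baseline that the paper generalizes; the function names and toy scores are illustrative, and the block-permutation machinery from de Finetti's Theorem is not shown.

```python
import numpy as np

def split_conformal_set(cal_scores, test_scores_per_label, alpha=0.1):
    """Classical split conformal prediction under exchangeability.

    cal_scores: nonconformity scores on held-out calibration data.
    test_scores_per_label: candidate label -> nonconformity score at the test point.
    Returns the prediction set with (1 - alpha) marginal coverage.
    """
    n = len(cal_scores)
    # Finite-sample-corrected (1 - alpha) quantile of the calibration scores.
    q = np.quantile(cal_scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    return [y for y, s in test_scores_per_label.items() if s <= q]

# Toy usage: score = 1 - predicted probability of the label.
rng = np.random.default_rng(0)
cal = rng.uniform(0, 0.5, size=200)  # a well-calibrated model yields low scores
test = {"cat": 0.2, "dog": 0.9, "bird": 0.45}
print(split_conformal_set(cal, test, alpha=0.1))  # labels whose score is below q
```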